camera view
EAGLE: Efficient Adaptive Geometry-based Learning in Cross-view Understanding
Unsupervised Domain Adaptation has been an efficient approach to transferring the semantic segmentation model across data distributions. Meanwhile, the recent Open-vocabulary Semantic Scene understanding based on large-scale vision language models is effective in open-set settings because it can learn diverse concepts and categories. However, these prior methods fail to generalize across different camera views due to the lack of cross-view geometric modeling. At present, there are limited studies analyzing cross-view learning. To address this problem, we introduce a novel Unsupervised Cross-view Adaptation Learning approach to modeling the geometric structural change across views in Semantic Scene Understanding. First, we introduce a novel Cross-view Geometric Constraint on Unpaired Data to model structural changes in images and segmentation masks across cameras. Second, we present a new Geodesic Flow-based Correlation Metric to efficiently measure the geometric structural changes across camera views. Third, we introduce a novel view-condition prompting mechanism to enhance the view-information modeling of the open-vocabulary segmentation network in cross-view adaptation learning. The experiments on different cross-view adaptation benchmarks have shown the effectiveness of our approach in cross-view modeling, demonstrating that we achieve State-of-the-Art (SOTA) performance compared to prior unsupervised domain adaptation and open-vocabulary semantic segmentation methods.
Disturbance-Free Surgical Video Generation from Multi-Camera Shadowless Lamps for Open Surgery
Kato, Yuna, Mori, Shohei, Saito, Hideo, Takatsume, Yoshifumi, Kajita, Hiroki, Isogawa, Mariko
Video recordings of open surgeries are in high demand for education and research. However, capturing unobstructed videos is challenging since surgeons frequently block the camera's field of view. To avoid occlusion, the positions and angles of the camera must be frequently adjusted, which is highly labor-intensive. Prior work has addressed this issue by installing multiple cameras on a shadowless lamp and arranging them to fully surround the surgical area. This setup increases the chances of some cameras capturing an unobstructed view. However, manual image alignment is needed in post-processing since camera configurations change every time surgeons move the lamp for optimal lighting. This paper aims to fully automate this alignment task. The proposed method identifies frames in which the lighting system moves, realigns them, and selects the camera with the least occlusion to generate a video that consistently presents the surgical field from a fixed perspective. A user study involving surgeons demonstrated that videos generated by our method were superior to those produced by conventional methods in terms of the ease of confirming the surgical area and the comfort during video viewing. Additionally, our approach showed improvements in video quality over existing techniques. Furthermore, we implemented several synthesis options for the proposed view-synthesis method and conducted a user study to assess surgeons' preferences for each option.
- North America > United States > Texas > Kleberg County (0.04)
- North America > United States > Texas > Chambers County (0.04)
- Europe > Germany (0.04)
- Asia > Japan (0.04)
- Questionnaire & Opinion Survey (0.94)
- Research Report > Experimental Study (0.93)
- Health & Medicine > Surgery (1.00)
- Health & Medicine > Diagnostic Medicine > Imaging (0.46)
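The camera-selection step described in the surgical video abstract above (choosing the camera with the least occlusion for each frame) can be illustrated with a minimal sketch. It assumes a per-frame, per-camera occlusion score is already available; how that score is computed is not stated in the abstract. The function name `select_cameras` and the `switch_margin` parameter are hypothetical, and the margin is an illustrative addition to reduce flicker between cameras, not something the abstract describes.

```python
def select_cameras(occlusion, switch_margin=0.1):
    """Pick one camera index per frame.

    occlusion[t][c] is an assumed occluded fraction in [0, 1]
    for camera c at frame t (hypothetical input format).
    """
    chosen = []
    current = None
    for scores in occlusion:
        # Camera with the lowest occlusion score in this frame.
        best = min(range(len(scores)), key=lambda c: scores[c])
        # Switch only when the best camera is clearly less occluded
        # than the one currently shown (illustrative hysteresis).
        if current is None or scores[current] - scores[best] > switch_margin:
            current = best
        chosen.append(current)
    return chosen


# Three frames, three cameras (made-up scores for illustration only).
occlusion = [
    [0.10, 0.50, 0.40],
    [0.15, 0.12, 0.60],
    [0.70, 0.20, 0.30],
]
print(select_cameras(occlusion))
```

Setting `switch_margin=0.0` reduces this to a plain per-frame minimum, which follows the abstract's wording more literally at the cost of more frequent camera changes.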
MAPLE: Encoding Dexterous Robotic Manipulation Priors Learned From Egocentric Videos
Gavryushin, Alexey, Wang, Xi, Malate, Robert J. S., Yang, Chenyu, Liconti, Davide, Zurbrügg, René, Katzschmann, Robert K., Pollefeys, Marc
Large-scale egocentric video datasets capture diverse human activities across a wide range of scenarios, offering rich and detailed insights into how humans interact with objects, especially those that require fine-grained dexterous control. Such complex, dexterous skills with precise controls are crucial for many robotic manipulation tasks, yet are often insufficiently addressed by traditional data-driven approaches to robotic manipulation. To address this gap, we leverage manipulation priors learned from large-scale egocentric video datasets to improve policy learning for dexterous robotic manipulation tasks. We present MAPLE, a novel method for dexterous robotic manipulation that learns features to predict object contact points and detailed hand poses at the moment of contact from egocentric images. We then use the learned features to train policies for downstream manipulation tasks. Experimental results demonstrate the effectiveness of MAPLE across 4 existing simulation benchmarks, as well as a newly designed set of 4 challenging simulation tasks requiring fine-grained object control and complex dexterous skills. The benefits of MAPLE are further highlighted in real-world experiments using a 17 DoF dexterous robotic hand; such simultaneous evaluation across both simulation and real-world experiments has remained underexplored in prior work. We additionally showcase the efficacy of our model on an egocentric contact point prediction task, validating its usefulness beyond dexterous manipulation policy learning.
- Europe > Switzerland (0.04)
- Europe > Netherlands > South Holland > Delft (0.04)
- Europe > Austria (0.04)
- Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)
- Research Report > New Finding (0.66)
- Research Report > Promising Solution (0.48)
Enhancing UAV Search under Occlusion using Next Best View Planning
Strand, Sigrid Helene, Wiedemann, Thomas, Burczek, Bram, Shutin, Dmitriy
Search and rescue missions are often critical following sudden natural disasters or in high-risk environmental situations. The most challenging search and rescue missions involve difficult-to-access terrains, such as dense forests with high occlusion. Deploying unmanned aerial vehicles for exploration can significantly enhance search effectiveness, facilitate access to challenging environments, and reduce search time. However, in dense forests, the effectiveness of unmanned aerial vehicles depends on their ability to capture clear views of the ground, necessitating a robust search strategy to optimize camera positioning and perspective. This work presents an optimized planning strategy and an efficient algorithm for the next best view problem in occluded environments. Two novel optimization heuristics, a geometry heuristic and a visibility heuristic, are proposed to enhance search performance by selecting optimal camera viewpoints. Comparative evaluations in both simulated and real-world settings reveal that the visibility heuristic performs better, identifying over 90% of hidden objects in simulated forests and offering 10% better detection rates than the geometry heuristic. Additionally, real-world experiments demonstrate that the visibility heuristic provides better coverage under the canopy, highlighting its potential for improving search and rescue missions in occluded environments.
- North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
- Europe > Germany (0.04)
- Information Technology > Robotics & Automation (0.54)
- Aerospace & Defense > Aircraft (0.54)
- Health & Medicine (0.46)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)
- Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles > Drones (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (0.47)
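The next best view problem mentioned in the UAV search abstract above can be illustrated with a generic greedy sketch: at each step, pick the candidate viewpoint that reveals the most ground cells not yet observed. This is a textbook greedy coverage strategy, not the paper's geometry or visibility heuristic, whose details the abstract does not give. The names `next_best_view` and `plan`, and the assumption that each viewpoint's visible cells are precomputed, are hypothetical.

```python
def next_best_view(candidates, observed):
    """Return (viewpoint, gain) for the view revealing the most new cells.

    candidates: dict mapping a viewpoint label to the set of ground
    cells assumed visible from it (hypothetical precomputed input).
    observed: set of cells already seen.
    """
    best_view, best_gain = None, 0
    for view, visible in candidates.items():
        gain = len(visible - observed)  # cells not seen so far
        if gain > best_gain:
            best_view, best_gain = view, gain
    return best_view, best_gain


def plan(candidates, max_views):
    """Greedily choose up to max_views viewpoints."""
    observed, path = set(), []
    for _ in range(max_views):
        view, _gain = next_best_view(candidates, observed)
        if view is None:  # no candidate reveals anything new
            break
        path.append(view)
        observed |= candidates[view]
    return path, observed


# Made-up visibility sets for illustration only.
candidates = {"A": {1, 2, 3}, "B": {3, 4}, "C": {4, 5, 6, 7}}
path, observed = plan(candidates, max_views=3)
print(path, sorted(observed))
```

In an occluded forest, the visible set per viewpoint would have to come from some occlusion model or sensor data; here it is simply given.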
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- North America > Canada > Quebec > Montreal (0.04)
Supplementary Material: MmCows: A Multimodal Dataset for Dairy Cattle Monitoring
This document provides additional details that complement the main paper. We discuss the steps used to synchronize and calibrate the visual data in Section A. Section B elaborates on the details of UWB localization, heading direction estimation, and obtaining the reference for lying behavior. We keep the figures, tables, and equations in numerical order and refer to them independently from the main paper unless explicitly stated otherwise. The paper checklist is attached as the final part of the main paper. We discuss additional details of processing the visual data and calibrating four camera views.
- North America > United States > Wisconsin > Dane County > Madison (0.04)
- North America > United States > Iowa (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Food & Agriculture > Agriculture (1.00)
- Information Technology (0.93)
STREETS: A Novel Camera Network Dataset for Traffic Flow
In this paper, we introduce STREETS, a novel traffic flow dataset from publicly available web cameras in the suburbs of Chicago, IL. We seek to address the limitations of existing datasets in this area. Many such datasets lack a coherent traffic network graph to describe the relationship between sensors.
- North America > United States > Illinois > Cook County > Chicago (0.24)
- North America > United States > New York (0.04)
- North America > United States > Minnesota (0.04)
- Transportation > Ground > Road (1.00)
- Government > Regional Government > North America Government > United States Government (1.00)
- Information Technology (0.93)
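The STREETS abstract above notes that many traffic datasets lack a graph describing how sensors relate to one another. As a rough illustration of what such a graph provides, the sketch below stores sensors as nodes of a directed graph and lists the sensors downstream of a given one. The sensor IDs, the `downstream` mapping, and the `reachable` function are hypothetical and are not taken from the STREETS dataset.

```python
from collections import deque

# Hypothetical sensor IDs; each edge points in the direction of traffic flow.
downstream = {
    "cam_a": ["cam_b"],
    "cam_b": ["cam_c", "cam_d"],
    "cam_c": [],
    "cam_d": [],
}


def reachable(graph, start):
    """Return the sensors downstream of `start` in breadth-first order."""
    seen, order, queue = {start}, [], deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


print(reachable(downstream, "cam_a"))
```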
- North America > United States (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > Singapore (0.04)
- Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)
- Information Technology > Artificial Intelligence > Vision > Video Understanding (0.76)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.46)
- North America > United States > Hawaii (0.04)
- North America > United States > California > Santa Barbara County > Santa Barbara (0.04)